Boosting Schema Matchers
Abstract
Schema matching is recognized as one of the basic operations required by the process of data and schema integration, and thus has a great impact on its outcome. We propose a new approach to combining matchers into ensembles, called Schema Matcher Boosting (SMB). This approach is based on a well-known machine learning technique called boosting. We present a boosting algorithm for schema matching with a unique ensembler feature, namely the ability to choose the matchers that participate in an ensemble. SMB offers schema matcher designers a new promise: instead of trying to design a perfect schema matcher that is accurate for all schema pairs, a designer can focus on finding better-than-random schema matchers. We provide thorough comparative empirical results showing that SMB outperforms, on average, any individual matcher. In our experiments we compared SMB with more than 30 other matchers and with several ensembling approaches, including the Meta-Learner of LSD, over real-world data comprising 230 schemata. Our empirical analysis shows that SMB improves, on average, over the performance of individual matchers. Moreover, SMB is shown to be consistently dominant, far beyond any other individual matcher. Finally, we observe that SMB performs better than the Meta-Learner in terms of precision, recall and F-Measure.
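The abstract describes SMB only at a high level. To illustrate the general boosting idea it builds on, here is a minimal AdaBoost-style sketch, assuming each matcher is reduced to +1/-1 (match/non-match) decisions on a fixed list of candidate attribute pairs with known ground truth. This is textbook AdaBoost, not the SMB algorithm from the paper; the function names `boost_matchers`, `predict` and `precision_recall_f1`, and the data representation, are hypothetical. Matchers that are never selected drop out of the ensemble, which loosely mirrors the matcher-selection feature, and the loop stops once no matcher beats random guessing on the reweighted pairs.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical representation: a matcher is reduced to its +1 (match) / -1
# (non-match) decisions on a fixed list of candidate attribute pairs.
Decisions = Sequence[int]
Ensemble = List[Tuple[str, float]]


def boost_matchers(matchers: Dict[str, Decisions],
                   labels: Sequence[int],
                   rounds: int = 10) -> Ensemble:
    """AdaBoost-style ensemble: returns (matcher name, weight alpha) per round."""
    n = len(labels)
    weights = [1.0 / n] * n  # start with a uniform distribution over pairs
    ensemble: Ensemble = []
    for _ in range(rounds):
        # Choose the matcher with the lowest weighted error under current weights.
        best_name: Optional[str] = None
        best_err = float("inf")
        for name, dec in matchers.items():
            err = sum(w for w, d, y in zip(weights, dec, labels) if d != y)
            if err < best_err:
                best_name, best_err = name, err
        if best_name is None or best_err >= 0.5:
            break  # no matcher is better than random on the reweighted pairs
        best_err = max(best_err, 1e-10)  # guard against a perfect matcher
        alpha = 0.5 * math.log((1 - best_err) / best_err)
        ensemble.append((best_name, alpha))
        # Raise the weight of pairs the chosen matcher got wrong, lower the rest.
        dec = matchers[best_name]
        weights = [w * math.exp(-alpha * y * d)
                   for w, d, y in zip(weights, dec, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble: Ensemble, matchers: Dict[str, Decisions], i: int) -> int:
    """Weighted vote of the selected matchers on candidate pair i."""
    score = sum(alpha * matchers[name][i] for name, alpha in ensemble)
    return 1 if score > 0 else -1


def precision_recall_f1(pred: Sequence[int],
                        labels: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall and F-Measure, treating +1 as a proposed match."""
    tp = sum(1 for p, y in zip(pred, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(pred, labels) if p == 1 and y == -1)
    fn = sum(1 for p, y in zip(pred, labels) if p == -1 and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

For instance, with three toy matchers that each misclassify a different single pair and a fourth that errs on two pairs, three rounds select the three single-error matchers, and their weighted vote labels every pair correctly even though no individual matcher does.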
Related papers
XML Matchers: approaches and challenges
Schema Matching, i.e. the process of discovering semantic correspondences between concepts adopted in different data source schemas, has been a key topic in Database and Artificial Intelligence research areas for many years. In the past, it was largely investigated especially for classical database models (e.g., E/R schemas, relational databases, etc.). However, in the latest years, the widespr...
Calibration and comparison of schema matchers
Schemas used in various environments become more and more numerous, though they do not comply with a universal standard. That is why the task of schema matching has emerged and its main objective is to find means to map a schema into another. Several initiations have occurred and algorithms have been proposed to solve the problem. They muster highly enticing solutions, though they have several fl...
Instance Matching with COMA++
Schema matching is the process of identifying semantic correspondences between schemas. COMA++ is a matching prototype which uses several characteristics of schemas to determine similarities between them, for example the names and data types of the schema elements and structural information. In this paper we propose two instance-based matchers for COMA++ to gain a further quality improvement. T...
Schema Matching across Query Interfaces on the Deep Web
Schema matching is a crucial step in data integration. Many approaches to schema matching have been proposed so far. Different types of information about schemas, including structures, linguistic features and data types, etc., have been used to match attributes between schemas. Relying on a single aspect of information about schemas for schema matching is not sufficient. Approaches have been prop...
CMC: Combining Multiple Schema-Matching Strategies Based on Credibility Prediction
Schema matching, which tries to find semantic correspondences between schema elements, is a key operation in data engineering. Combining multiple matching strategies is a very promising technique for schema matching. To overcome the limitations of existing combination systems and to achieve better performances, in this paper the CMC system is proposed, which combines multiple matchers based on ...
Publication date: 2008